feat: add backend for an instruction table#1780
feat: add backend for an instruction table#1780thecharlesjenkins wants to merge 5 commits intoriscv:mainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1780 +/- ##
=======================================
Coverage 72.18% 72.19%
=======================================
Files 52 52
Lines 27744 27744
Branches 6002 6002
=======================================
+ Hits 20028 20029 +1
+ Misses 7716 7715 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
I don't believe generated content intended for a GPL project needs to licensed as GPL at creation. It does need to be GPL-compatible, and there is a good description of many licenses and their compatibility with GPL at https://www.gnu.org/licenses/license-list.en.html. As an example, you'll find "MIT license" not only compatible with GPL, but also already present here.
What are "requested instructions"? Put differently, what exactly are you generating? Are you intending to replace existing content, or generate new content? How is it used in Linux? (Is this for instruction single-step?) |
Oh interesting, if it's preferred to have a different license I can do that to.
"requested instructions" for Linux is all instructions but if other projects what to use this table they can take a subset of instructions if they prefer, I was just trying to be general. Here is the patch series to Linux I just sent out to use this table https://lore.kernel.org/linux-riscv/20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com/T/#t. Instruction single-step is part of it but there are a ton of places that use manually generated instructions that I swap over to using this table in the series. |
To me, it seems like the "easy button" is to use the same license for content generated from other content as the original content. The YAML files here are BSD-based (described as compatible with GPL at the link above as "The Clear BSD License").
That seems very, very cool! I'd love to look at the script that performs the generation, but it doesn't appear that patch (01/16) made it to the mailing list archives. :-/ |
There is a lot of code that is manually manipulating riscv instructions. Constructing these headers for new instructions is time consuming and error prone. The first patch in this series introduces the instruction definition table along with a script that runs at compile time to create all of the associated instruction manipulation functions. The remaining patches migrate all of the manual instruction code to use these generated functions. The instruction definition table is generated from the riscv-unified-db[1], an open source RVI project for instruction specifications. I am sending this series at the same time as I am sending the patch to the riscv-unified-db for the format of the instruction table. That patch can be accessed on github [2]. A lot of the validation for these changes is from running all possible 32-bit or 16-bit integers through these functions to see that any given instruction will produce to expected result. I give more detail on the test cases in the notes of the first couple of patches. For the KVM patches, I also introduce two test cases to ensure that the instruction manipulation is working as expected, along with a bug fix of the current implemention. [1] https://github.com/riscv/riscv-unified-db [2] riscv/riscv-unified-db#1780 # Describe the purpose of this series. The information you put here # will be used by the project maintainer to make a decision whether # your patches should be reviewed, and in what priority order. Please be # very detailed and link to any relevant discussions or sites that the # maintainer can review to better understand your proposed changes. If you # only have a single patch in your series, the contents of the cover # letter will be appended to the "under-the-cut" portion of the patch. # Lines starting with # will be removed from the cover letter. You can # use them to add notes or reminders to yourself. If you want to use # markdown headers in your cover letter, start the line with ">#". # You can add trailers to the cover letter. Any email addresses found in # these trailers will be added to the addresses specified/generated # during the b4 send stage. You can also run "b4 prep --auto-to-cc" to # auto-populate the To: and Cc: trailers based on the code being # modified. To: Paul Walmsley <pjw@kernel.org> To: Palmer Dabbelt <palmer@dabbelt.com> To: Alexandre Ghiti <alex@ghiti.fr> To: Anup Patel <anup@brainfault.org> To: Atish Patra <atish.patra@linux.dev> To: Conor Dooley <conor@kernel.org> To: Paolo Bonzini <pbonzini@redhat.com> To: Andrew Morton <akpm@linux-foundation.org> To: Shuah Khan <shuah@kernel.org> Cc: linux-riscv@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org Cc: kvm-riscv@lists.infradead.org Cc: linux-kselftest@vger.kernel.org Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com> --- Changes in v2: - EDITME: describe what is new in this series revision. - EDITME: use bulletpoints and terse descriptions. - Link to v1: https://lore.kernel.org/r/20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com --- b4-submit-tracking --- # This section is used internally by b4 prep for tracking purposes. { "series": { "revision": 2, "change-id": "20260114-riscv_insn_table-38495495dfd3", "prefixes": [], "history": { "v1": [ "20260407-riscv_insn_table-v1-0-54b4736a1e77@gmail.com" ] } } }
|
I resent the patch and then right after I sent it the original one showed up on the archives so now it is there twice... It might have been throttled since it is a large patch. Here it is https://lore.kernel.org/lkml/20260407-riscv_insn_table-v1-1-54b4736a1e77@gmail.com/ |
ThinkOpenly
left a comment
There was a problem hiding this comment.
One thing we try to do for backends that generate consistent output is to add a simple test that compares newly generated output with a "golden" reference. This can be a bit of a pain to maintain, but I expect we'll have "autofix" fix things up automatically in the near future. This helps if someone, for example, changes the scripts here but thinks the changes are innocuous when they aren't.
Yes I agree that makes sense. However, I am not sure how to add tests in the new generator infrastructure, and how I would go about making a test that wouldn't break each time a new instruction is added. It looks like the only existing test case in this new method is the type linting: . I can mess with it longer, but I don't really know how this is supposed to fit together. |
61274ee to
a1f6fe8
Compare
It's supposed to break when new content is added. We're trying to catch unintentionally added content, and for intentionally added content, to see it is processed as expected. With no representation of the output of the script, there's nothing to test against and script modifications go untested.
I'm thinking more like the testing done in You could take it further, of course:
All of that is a little risky if paths or file names change in the kernel, of course. |
a1f6fe8 to
3d473cf
Compare
|
I added a couple of test cases. One of the test cases is to compare against the full output, and the other ones check that hypothetical instruction match expected output. |
|
I added mocha to mock the instruction objects to improve the testing. How do I trigger the dockerfile to rebuild so that it gets the updated Rakefile and pulls in mocha? |
21cfdff to
982a22d
Compare
982a22d to
39ee8dd
Compare
| puts "All IDL passed type checking" | ||
| end | ||
|
|
||
| task :inst_table_gen do |
There was a problem hiding this comment.
This whole stanza should probably not be in the main Rakefile, but closer to the generator itself, and included here. We have a mechanism already set up well above that automatically pulls in tools/*/tasks.rake. So, I suggest creating a new tools/ruby-gems/tasks.rake and move this content there.
There was a problem hiding this comment.
I don't quite see how this fits together. The tasks.rake was used run the "old" style of the backends, but not to test them. The "testing" for these old backends is simply the CI running the gen command on the backends and seeing if it crashes. I probably should have split the commit up, but in this last commit I also made the tools/ruby-gems/udb-gen/test directory that leverages the minitest framework that I saw is used in other places in the repo and I put my unit test cases for this new backend there.
However, this test case is not a unit test and is instead an integration test so I don't think it should live inside of the generator itself. It is testing the integration between the configs at the root of the directory, the UDB module, and udb-gen module. This block appears to currently the only place where integration tests are written, so if that is not desired I can construct a new place to put integration tests.
There was a problem hiding this comment.
On second thought, I'll add this integration test into udb-gen, it will keep things organized
|
|
||
| # tapioca annotations requires the ext/rbi-central submodule to be initialized. | ||
| if [ ! -f "${UDB_ROOT}/ext/rbi-central/index.json" ]; then | ||
| echo "Initializing ext/rbi-central submodule..." | ||
| git -C "${UDB_ROOT}" submodule update --init ext/rbi-central | ||
| fi | ||
| "${UDB_ROOT}"/bin/bundle exec --gemfile "${UDB_ROOT}"/Gemfile tapioca annotations |
There was a problem hiding this comment.
@dhower-qc and/or @jordancarlin, thoughts here? @thecharlesjenkins explained this change in the commit message a3f4b4f:
Tapioca annotations in rbi-central are community maintained and are not
always correct. Remove them from the chore script to avoid them being
forcefully added by the CI. This was making it impossible to add "mocha"
as a dependency because the rbi-central types are conflicting with the
sorbet-typed versions.
There was a problem hiding this comment.
We discussed this at the UDB PR meeting today. The thought is that this may have crossed paths with some Ruby Gemfile changes. Could you restore this code, rebase, then see if any related issues are no longer present?
|
Whoa. I guess I forgot to actually submit comments from an earlier review session. sigh. |
This new backend that can be generated with "bundle exec ruby bin/container/udb-gen inst-table -o inst_table.txt" is designed to organize the UDB instructions into a table modeled after the syscall table in Linux. The usecase for this is to commit the generated table into Linux which will then use the table to generate the necessary instruction headers. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Tapioca annotations in rbi-central are community maintained and are not always correct. Remove them from the chore script to avoid them being forcefully added by the CI. This was making it impossible to add "mocha" as a dependency because the rbi-central types are conflicting with the sorbet-typed versions. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Add the Mocha gem to use for mocking objects in test cases. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Introduce a test directory for udb-gen and add test cases for the instruction table. These test cases use mock instructions to validate all of the features of the instruction table generation. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Validate the instruction table generation against all existing instructions. This test file will need to be regenerated each time instructions change, but can be useful to catch unexpected changes. Signed-off-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
39ee8dd to
c690247
Compare
| @@ -0,0 +1,2 @@ | |||
| # SPDX-FileCopyrightText: 2025 Contributors to the RISCV UnifiedDB <https://github.com/riscv/riscv-unified-db> | |||
There was a problem hiding this comment.
You are welcome and encouraged to put your own copyright here, if you wish.
Regardless, let's change this to "2026".
| @@ -0,0 +1,24 @@ | |||
| # SPDX-License-Identifier: BSD-3-Clause-Clear | |||
| # Copyright (c) Charlie Jenkins | |||
There was a problem hiding this comment.
If I understand things correctly:
| # Copyright (c) Charlie Jenkins | |
| SPDX-FileCopyrightText: # Copyright (c) Charlie Jenkins |
There was a problem hiding this comment.
Here and elsewhere, of course.
| @gen_dir = Dir.mktmpdir | ||
| @resolver = Udb::Resolver.new( | ||
| Udb.repo_root, | ||
| gen_path_override: Pathname.new(@gen_dir) | ||
| ) |
There was a problem hiding this comment.
Do we delete this temporary directory somewhere?
|
|
||
| namespace :chore do | ||
| namespace :udb_gen do | ||
| task :update_fixtures do |
There was a problem hiding this comment.
Add a description here, please.
This new backend that can be generated with "bundle exec rake gen:inst_table" is designed to organize all of the requested instructions into a table modeled after the syscall table in Linux. The usecase for this is to commit the generated table into Linux which will use the table to generate the necessary instruction headers.
To be able to commit to Linux, I am marking the generated file as GPL-2.0-only and Reuse requires adding the License file to this repo for that.